minimax value
Minimax Optimal Algorithms for Unconstrained Linear Optimization
McMahan, H. Brendan, Abernethy, Jacob
We design and analyze minimax-optimal algorithms for online linear optimization games in which the player's choice is unconstrained. The player strives to minimize regret, the difference between his loss and the loss of a post-hoc benchmark strategy. While the standard benchmark is the loss of the best strategy chosen from a bounded comparator set, we consider a very broad range of benchmark functions. The problem is cast as a sequential multi-stage zero-sum game, and we give a thorough analysis of its minimax behavior, characterizing the value of the game as well as both the player's and the adversary's optimal strategies. We show how these objects can be computed efficiently under certain circumstances, and, by selecting an appropriate benchmark, we construct a novel hedging strategy for an unconstrained betting game.
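The backward-induction recursion behind such minimax analyses can be illustrated on a deliberately small case. The sketch below is not the authors' algorithm. It assumes a one-dimensional game in which the adversary is restricted to $g_t \in \{-1, +1\}$ and the benchmark depends only on the final sum $G_T = \sum_t g_t$. The hypothetical `terminal` function gives the adversary's payoff from the benchmark, for example $R|G_T|$ for the comparator set $[-R, R]$. At each step the player's best response equalizes the two branches, $x = (V_{t+1}(G-1) - V_{t+1}(G+1))/2$, so the value becomes the average of the two continuation values.

```python
from functools import lru_cache


def solve(T, terminal):
    """Minimax value and optimal play for a T-round unconstrained 1-D game.

    Illustrative assumptions (not from the paper): the adversary picks
    g_t in {-1, +1}, the player picks any real x_t and suffers g_t * x_t,
    and the adversary additionally receives terminal(G_T), where G_T is the
    sum of all g_t (e.g. terminal(G) = R * |G| for comparator set [-R, R]).
    """

    @lru_cache(maxsize=None)
    def value(t, G):
        # Remaining value of the game after t rounds with running sum G.
        if t == T:
            return float(terminal(G))
        up, down = value(t + 1, G + 1), value(t + 1, G - 1)
        # min_x max(x + up, -x + down) is attained where both lines meet,
        # which gives the average of the two continuation values.
        return (up + down) / 2

    def play(t, G):
        # Player's minimax move for round t + 1 (valid for t < T):
        # the point where the adversary's two choices tie.
        return (value(t + 1, G - 1) - value(t + 1, G + 1)) / 2

    return value, play


value, play = solve(4, abs)
print(value(0, 0))  # 1.5
```

Because every step averages the two branches, the value equals $\mathbb{E}[\text{terminal}(\epsilon_1 + \dots + \epsilon_T)]$ for independent uniform random signs $\epsilon_t$; with `terminal=abs` and $T = 4$ this is $\mathbb{E}|\epsilon_1 + \dots + \epsilon_4| = 1.5$.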
Incentivizing Exploration with Linear Contexts and Combinatorial Actions
We advance the study of incentivized bandit exploration, in which arm choices are viewed as recommendations and are required to be Bayesian incentive compatible. Recent work has shown that, under certain independence assumptions, the popular Thompson sampling algorithm becomes incentive compatible once enough initial samples have been collected. We give an analog of this result for linear bandits, where the independence of the prior is replaced by a natural convexity condition. This opens up the possibility of efficient and regret-optimal incentivized exploration in high-dimensional action spaces. In the semibandit model, we also improve the sample complexity of the initial data-collection phase that precedes Thompson sampling.
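For reference, the Thompson sampling rule that this result concerns can be sketched for a linear bandit. This is a minimal illustration under assumptions not taken from the abstract: a Gaussian prior $\theta \sim N(0, \sigma_0^2 I)$, Gaussian reward noise, and a finite action set given as the rows of a matrix. It shows only the posterior-sampling step and the conjugate update, not the incentive-compatibility analysis or the initial data-collection phase. The class name `LinearThompsonSampler` is hypothetical.

```python
import numpy as np


class LinearThompsonSampler:
    """Thompson sampling for a linear bandit with a Gaussian prior (sketch).

    Assumed model: theta ~ N(0, prior_sd^2 I) and
    reward = action @ theta + N(0, noise_sd^2).
    """

    def __init__(self, dim, noise_sd=1.0, prior_sd=1.0, seed=0):
        self.precision = np.eye(dim) / prior_sd**2  # posterior precision matrix
        self.weighted_sum = np.zeros(dim)           # sum of action * reward / noise_var
        self.noise_var = noise_sd**2
        self.rng = np.random.default_rng(seed)

    def posterior(self):
        # Posterior mean and covariance of theta.
        cov = np.linalg.inv(self.precision)
        return cov @ self.weighted_sum, cov

    def choose(self, actions):
        # Sample one parameter vector from the posterior and act greedily on it.
        mean, cov = self.posterior()
        theta = self.rng.multivariate_normal(mean, cov)
        return int(np.argmax(actions @ theta))

    def update(self, action, reward):
        # Conjugate Bayesian linear-regression update.
        self.precision += np.outer(action, action) / self.noise_var
        self.weighted_sum += action * reward / self.noise_var


sampler = LinearThompsonSampler(dim=2)
arms = np.eye(2)
for _ in range(100):
    sampler.update(arms[0], 1.0)
    sampler.update(arms[1], -1.0)
print(sampler.choose(arms))
```

After 100 observations per arm, the posterior is concentrated enough that the sampled parameter almost always ranks the first arm highest; early on, the posterior is wide and the sampled choices explore.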
Lookahead Pathology in Monte-Carlo Tree Search
Nguyen, Khoi P. N., Ramanujan, Raghuram
Monte-Carlo Tree Search (MCTS) is an adversarial search paradigm that first found prominence with its success in the domain of computer Go. Early theoretical work established the game-theoretic soundness and convergence bounds for Upper Confidence bounds applied to Trees (UCT), the most popular instantiation of MCTS; however, there remain notable gaps in our understanding of how UCT behaves in practice. In this work, we address one such gap by considering the question of whether UCT can exhibit lookahead pathology -- a paradoxical phenomenon first observed in Minimax search where greater search effort leads to worse decision-making. We introduce a novel family of synthetic games that offer rich modeling possibilities while remaining amenable to mathematical analysis. Our theoretical and experimental results suggest that UCT is indeed susceptible to pathological behavior in a range of games drawn from this family.
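The UCT selection rule mentioned above picks, at each tree node, the child maximizing the UCB1 score $\bar{X}_j + c\sqrt{\ln N / n_j}$, where $N$ is the parent's visit count and $n_j$ the child's. The sketch below shows only this selection step under simple assumptions: rewards are stored from the perspective of the player choosing at the parent (as is usual in adversarial MCTS), unvisited children are tried first, and $c = \sqrt{2}$ is used as a common default. The `Node` class is hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    visits: int = 0
    total_reward: float = 0.0  # from the viewpoint of the player moving at the parent
    children: list = field(default_factory=list)


def uct_select(parent, c=math.sqrt(2)):
    """Return the child of `parent` with the highest UCB1 score."""
    for child in parent.children:
        if child.visits == 0:  # try every move once before using the formula
            return child
    log_n = math.log(parent.visits)
    return max(
        parent.children,
        key=lambda ch: ch.total_reward / ch.visits + c * math.sqrt(log_n / ch.visits),
    )


root = Node(visits=10, children=[Node(9, 5.4), Node(1, 0.0)])
print(root.children.index(uct_select(root)))  # 1
```

Here the child with the higher mean (0.6) loses to the rarely visited one because its exploration bonus is much larger; this exploration term is what drives UCT's deeper search as the budget grows.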
Completeness of Unbounded Best-First Game Algorithms
In this article, we prove the completeness of the following game search algorithms: unbounded best-first minimax with completion and descent with completion; that is, we show that, given enough time, they find the best game strategy. We then generalize these two algorithms to perfect-information multiplayer games and show that these generalizations are also complete: they find one of the equilibrium points.
Game Tree Search in a Robust Multistage Optimization Framework: Exploiting Pruning Mechanisms
Hartisch, Michael, Lorenz, Ulf
We investigate pruning in search trees of so-called quantified integer linear programs (QIPs). QIPs consist of a set of linear inequalities and a minimax objective function, where some variables are existentially and others universally quantified. They can be interpreted either as two-person zero-sum games between an existential and a universal player, or as multistage optimization problems under uncertainty. Solutions are so-called winning strategies for the existential player, which specify how to react to moves of the universal player (i.e. certain assignments of universally quantified variables) so as to win the game with certainty. QIPs can be solved with game tree search enhanced with non-chronological back-jumping. We develop and theoretically substantiate pruning techniques based on (algebraic) properties similar to pruning mechanisms known from linear programming and quantified boolean formulas. The presented Strategic Copy-Pruning mechanism makes it possible to implicitly deduce the existence of a strategy in linear time (by static examination of the QIP matrix) without explicitly traversing the strategy itself. We show that an implementation of these findings can massively speed up the search process.
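As a rough illustration of how a QIP maps onto game tree search, the sketch below decides whether the existential player has a winning strategy for a tiny instance. It makes simplifying assumptions that are not from the paper: all variables are binary, the quantifier prefix is given as a string such as `"EA"` (exists, then for all), the minimax objective is dropped so only feasibility of $Ax \le b$ is checked, and the only pruning is the short-circuiting of `any`/`all` (no back-jumping and no Strategic Copy-Pruning). The function name `existential_wins` is hypothetical.

```python
def existential_wins(quantifiers, A, b, assignment=()):
    """Naive game-tree search for a binary QIP feasibility question (sketch).

    quantifiers: string over {"E", "A"}, one letter per variable, in move order.
    Returns True if the existential player can force A @ x <= b.
    """
    k = len(assignment)
    if k == len(quantifiers):
        # Leaf: all variables fixed, check every linear inequality.
        return all(
            sum(a * x for a, x in zip(row, assignment)) <= rhs
            for row, rhs in zip(A, b)
        )
    outcomes = (existential_wins(quantifiers, A, b, assignment + (v,)) for v in (0, 1))
    # The existential player needs one good move; the universal player needs one
    # refuting move. any/all stop early, which is the only pruning done here.
    return any(outcomes) if quantifiers[k] == "E" else all(outcomes)


# Constraints x1 - x2 <= 0 and -x1 + x2 <= 0, i.e. x1 == x2.
A, b = [[1, -1], [-1, 1]], [0, 0]
print(existential_wins("AE", A, b), existential_wins("EA", A, b))  # True False
```

Swapping the quantifier order of the same constraints flips the outcome: with `"AE"` the existential player moves last and can copy $x_1$, while with `"EA"` the universal player moves last and can always break the equality.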
An Online Learning Approach to Generative Adversarial Networks
Grnarova, Paulina, Levy, Kfir Y., Lucchi, Aurelien, Hofmann, Thomas, Krause, Andreas
We consider the problem of training generative models with a Generative Adversarial Network (GAN). Although GANs can accurately model complex distributions, they are known to be difficult to train because of instabilities caused by a difficult minimax optimization problem. In this paper, we view the problem of training GANs as finding a mixed strategy in a zero-sum game. Building on ideas from online learning, we propose a novel training method named Chekhov GAN. On the theory side, we show that our method provably converges to an equilibrium for semi-shallow GAN architectures, i.e. architectures in which the discriminator is a one-layer network and the generator is arbitrary. On the practical side, we develop an efficient heuristic guided by our theoretical results, which we apply to commonly used deep GAN architectures. On several real-world tasks, our approach exhibits improved stability and performance compared with standard GAN training.
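The idea of casting training as the search for a mixed equilibrium of a zero-sum game, and solving it with online learning, can be illustrated on a finite matrix game. The sketch below is a swapped-in textbook technique, not Chekhov GAN: both players run the multiplicative-weights (Hedge) algorithm against each other, and their time-averaged mixed strategies approach a Nash equilibrium, because the duality gap of the averages is bounded by the sum of the two players' average regrets. The payoff matrix, step size `eta`, and round count are illustrative assumptions.

```python
import math


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def hedge_selfplay(payoff, rounds=20000, eta=0.01):
    """Both players run Hedge; returns their time-averaged mixed strategies.

    payoff[i][j] is what the row player wins (and the column player loses)
    when row plays i and column plays j.
    """
    m, n = len(payoff), len(payoff[0])
    p, q = [1.0 / m] * m, [1.0 / n] * n
    avg_p, avg_q = [0.0] * m, [0.0] * n
    for _ in range(rounds):
        avg_p = [a + x / rounds for a, x in zip(avg_p, p)]
        avg_q = [a + y / rounds for a, y in zip(avg_q, q)]
        # Expected payoff of each pure action against the opponent's current mix.
        row_gain = [sum(payoff[i][j] * q[j] for j in range(n)) for i in range(m)]
        col_loss = [sum(payoff[i][j] * p[i] for i in range(m)) for j in range(n)]
        # Multiplicative-weights updates, renormalized every round for stability.
        p = normalize([x * math.exp(eta * g) for x, g in zip(p, row_gain)])
        q = normalize([y * math.exp(-eta * loss) for y, loss in zip(q, col_loss)])
    return avg_p, avg_q


A = [[2, -1], [-1, 1]]
p_bar, q_bar = hedge_selfplay(A)
value = sum(p_bar[i] * A[i][j] * q_bar[j] for i in range(2) for j in range(2))
print([round(x, 3) for x in p_bar], round(value, 3))
```

For the payoff matrix `[[2, -1], [-1, 1]]`, the equilibrium puts probability $2/5$ on the first action for each player and the game value is $1/5$; the averaged strategies returned by the sketch should be close to these numbers, while the per-round strategies themselves may keep oscillating.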
Sibling Conspiracy Number Search
Pawlewicz, Jakub (University of Warsaw) | Hayward, Ryan B. (University of Alberta)
For some two-player games (e.g. Go), no accurate and inexpensive heuristic is known for evaluating the leaves of a search tree. For other games (e.g. chess), a heuristic is known (the sum of piece values). For still others (e.g. Hex), only a local heuristic is known: one that compares children reliably but non-siblings poorly (the cell voltage drop in the Shannon/Anshelevich electric circuit model). In this paper we introduce a search algorithm for two-player perfect-information games with a reasonable local heuristic. Sibling Conspiracy Number Search (SCNS) is an anytime best-first version of Conspiracy Number Search that is based not on the evaluation of leaf states of the search tree but, for each node, on the relative evaluation scores of all children of that node. SCNS refines CNS search value intervals, converging to Proof Number Search. SCNS is a good framework for a game player. We tested SCNS in the domain of Hex, with promising results. We implemented an 11-by-11 SCNS Hex bot, DeepHex, and played it against the current Hex bot champion MoHex, a Monte-Carlo Tree Search player, and the previous Hex bot champion Wolve, an Alpha-Beta Search player. DeepHex widely outperforms Wolve at all time levels, and narrowly outperforms MoHex once time reaches 4 min/move.
Pruning Game Tree by Rollouts
Huang, Bojun (Microsoft Research)
In this paper we show that the alpha-beta algorithm and its successor MT-SSS*, two classic minimax search algorithms, can be implemented as rollout algorithms, a generic algorithmic paradigm widely used in many domains. Specifically, we define a family of rollout algorithms in which the rollout policy is restricted to selecting successor nodes only from a certain subset of the children list. We show that any rollout policy in this family (deterministic or randomized) is guaranteed to evaluate the game tree correctly with a finite number of rollouts. Moreover, we identify simple rollout policies in this family that "implement" alpha-beta and MT-SSS*: given any game tree, the rollout algorithms with these particular policies always visit the same set of leaf nodes, in the same order, as alpha-beta and MT-SSS*, respectively. Our results suggest that traditional pruning techniques and the recent Monte Carlo Tree Search algorithms, two competing approaches to game tree evaluation, may be unified under the rollout paradigm.
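The claim that a rollout policy visits "the same set of leaf nodes in the same order" as alpha-beta is easiest to check against a reference implementation that records the leaves it evaluates. The sketch below is plain depth-first alpha-beta, not the paper's rollout formulation. It assumes a game tree given as nested Python lists whose non-list entries are leaf values, with a maximizing root, and records each visited leaf by its path of child indices.

```python
import math


def alphabeta(tree):
    """Evaluate a nested-list game tree (root = max) and record leaf visit order."""
    visited = []

    def search(node, path, alpha, beta, maximizing):
        if not isinstance(node, list):  # leaf: a static value
            visited.append(path)
            return node
        best = -math.inf if maximizing else math.inf
        for i, child in enumerate(node):
            score = search(child, path + (i,), alpha, beta, not maximizing)
            if maximizing:
                best = max(best, score)
                alpha = max(alpha, best)
            else:
                best = min(best, score)
                beta = min(beta, best)
            if alpha >= beta:  # remaining siblings cannot affect the root value
                break
        return best

    return search(tree, (), -math.inf, math.inf, True), visited


value, leaves = alphabeta([[3, 5], [2, 9], [0, 1]])
print(value, leaves)  # 3 [(0, 0), (0, 1), (1, 0), (2, 0)]
```

On this tree the leaves `9` and `1` are pruned: once the first subtree guarantees the maximizer 3, a single leaf below 3 is enough to cut off each later min node.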
e-Valuate: A Two-player Game on Arithmetic Expressions -- An Update
Aravamuthan, Sarang, Ganguly, Biswajit
e-Valuate is a game on arithmetic expressions. The players have contrasting roles: one maximizes and the other minimizes the given expression. The maximizer proposes values and the minimizer substitutes them for variables of his choice. When the expression is fully instantiated, its value is compared with the minimax value that would result if both players played their optimal strategies, and the winner is declared based on this comparison. We use a game tree to represent the state of the game and show how the minimax value can be computed efficiently using backward induction and alpha-beta pruning. The efficacy of alpha-beta pruning depends on the order in which the nodes are evaluated, and further improvements can be obtained by using transposition tables to avoid re-evaluating the same nodes. We propose a heuristic for node ordering and show that using this heuristic together with transposition tables improves performance, by comparing the number of nodes pruned by each method. We also describe some domain-specific variants of this game. The first is a graph-theoretic formulation in which two players share a set of elements of a graph by coloring a related set, with each player seeking to maximize his share. The shared set can be the set of vertices, edges, or faces (for a planar graph). One application is sharing the regions enclosed by a planar graph, where each player aims to maximize the area of his share. Another variant is a tiling game in which the players alternately place dominoes on an $8 \times 8$ checkerboard to construct a maximal partial tiling. We show that the size $x$ of the tiling satisfies $22 \le x \le 32$ by proving that any maximal partial tiling requires at least $22$ dominoes.
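Backward induction with a transposition table can be sketched for a simplified version of the game. The rules encoded below are an assumption, since the abstract does not spell them out: the maximizer proposes a value from a finite pool (each value usable once), the minimizer assigns it to any still-free variable, and the game ends when every variable is assigned. `functools.lru_cache` plays the role of the transposition table, so a position reached through different move orders is evaluated only once; alpha-beta pruning and the paper's node-ordering heuristic are omitted. The function name `evalue_minimax` is hypothetical.

```python
from functools import lru_cache


def evalue_minimax(expr, n_vars, pool):
    """Minimax value of a simplified e-Valuate game (hypothetical rules).

    expr: function of n_vars numbers; pool: values the maximizer may propose.
    """

    @lru_cache(maxsize=None)  # transposition table keyed on the position
    def value(remaining, assignment):
        free = [i for i, x in enumerate(assignment) if x is None]
        if not free:
            return expr(*assignment)
        best = float("-inf")
        for v in sorted(set(remaining)):  # maximizer proposes v
            rest = list(remaining)
            rest.remove(v)
            rest = tuple(rest)
            # Minimizer substitutes v for whichever free variable hurts most.
            worst = min(
                value(rest, assignment[:i] + (v,) + assignment[i + 1:])
                for i in free
            )
            best = max(best, worst)
        return best

    if len(pool) < n_vars:
        raise ValueError("pool must contain at least one value per variable")
    return value(tuple(sorted(pool)), (None,) * n_vars)


print(evalue_minimax(lambda a, b: a - b, 2, [1, 5, 9]))  # 4
```

For $a - b$ with the pool $\{1, 5, 9\}$, the maximizer's best opening is to propose $5$: wherever the minimizer places it, the maximizer answers with $1$ or $9$ in the other slot to reach $4$, whereas proposing $1$ or $9$ first lets the minimizer force $-4$.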